Joint Learning of CNN and LSTM for Image Captioning
نویسندگان
چکیده
In this paper, we describe the details of our methods for the participation in the subtask of the ImageCLEF 2016 Scalable Image Annotation task: Natural Language Caption Generation. The model we used is the combination of a procedure of encoding and a procedure of decoding, which includes a Convolutional neural network(CNN) and a Long Short-Term Memory(LSTM) based Recurrent Neural Network. We first train a model on the MSCOCO dataset and then fine tune the model on different target datasets collected by us to get a more suitable model for the natural language caption generation task. Both of the parameters of CNN and LSTM are learned together.
منابع مشابه
Stacked RNNs for Encoder-Decoder Networks: Accurate Machine Understanding of Images
We address the image captioning task by combining a convolutional neural network (CNN) with various recurrent neural network architectures. We train the models on over 400,000 training examples ( roughly 80,000 images, with 5 captions per image) from the Microsoft 2014 COCO challenge. We demonstrate that stacking a 2-Layer RNN provides better results on image captioning tasks than both a Vanill...
متن کاملChanges on the Horizon for the Multimedia Community
The Impact of Deep Learning The development of AI algorithms, represented by deep learning, has bolstered multimedia research. In particular, deep learning has led to a multimodality-based algorithm framework, enabling the effective fusion and use of cross-domain data. Take image and video captioning, for example. A couple of years ago, tagging was the only way to describe images and videos. Bu...
متن کاملDeep Poetry: Word-Level and Character-Level Language Models for Shakespearean Sonnet Generation
Text generation is a foundational task in natural language processing, forming the core of a diverse set of practical applications ranging from image captioning and text summarization to question answering. However, most of this work has focused on generating prose. We investigate whether deep learning systems can be used to synthesize poetry, in particular Shakespearean-styled works. Previous ...
متن کاملLearning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملImage Caption Generation with Text-Conditional Semantic Attention
Attention mechanisms have attracted considerable interest in image captioning due to its powerful performance. However, existing methods use only visual content as attention and whether textual context can improve attention in image captioning remains unsolved. To explore this problem, we propose a novel attention mechanism, called textconditional attention, which allows the caption generator t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016